Update RecursiveArrayTools.jl compatibility to version 3 #2194

Open, wants to merge 39 commits into main


huiyuxie
Member

@huiyuxie huiyuxie commented Dec 8, 2024

Supplementary fix for #2150 and JoshuaLampert/DispersiveShallowWater.jl#163 (merged).

Contributor

github-actions bot commented Dec 8, 2024

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@huiyuxie
Member Author

huiyuxie commented Dec 8, 2024

I made the initialization algorithm for `DiscreteCallback` default to `nothing`. If that is not the change you intended with the new struct, this PR won't help.

@huiyuxie huiyuxie requested a review from ranocha December 8, 2024 09:34
@huiyuxie
Member Author

huiyuxie commented Dec 8, 2024

This will make the current package compatible with Julia >= 1.10 only; the CI tests related to Julia < 1.10 will definitely fail.

@huiyuxie
Member Author

huiyuxie commented Dec 8, 2024

The dependency management is really messy here. For example, the configurations of the test, docs, and main project environments never align, which makes debugging the environment configuration hard.

@huiyuxie huiyuxie requested a review from sloede December 8, 2024 10:49
@huiyuxie
Member Author

huiyuxie commented Dec 8, 2024

Please review, @ranocha @sloede.

@sloede
Member

sloede commented Dec 8, 2024

This will make the current package compatible with Julia >= 1.10

I don't think we are ready to do this just yet - or did we make a decision about this that I forgot about, @ranocha?


codecov bot commented Dec 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.12%. Comparing base (c69fe96) to head (86d87a4).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2194      +/-   ##
==========================================
- Coverage   96.42%   90.12%   -6.30%     
==========================================
  Files         487      487              
  Lines       39352    39344       -8     
==========================================
- Hits        37942    35456    -2486     
- Misses       1410     3888    +2478     
Flag Coverage Δ
unittests 90.12% <100.00%> (-6.30%) ⬇️

Flags with carried-forward coverage won't be shown.


@huiyuxie
Member Author

huiyuxie commented Dec 9, 2024

Then the issue #1789 will be delayed - when do you plan to upgrade to Julia 1.10?

@sloede
Member

sloede commented Dec 9, 2024

Then the issue #1789 will be delayed - when do you plan to upgrade to Julia 1.10?

Which version of RecursiveArrayTools.jl is really required? Just v3 (i.e., v3.0.0 would be sufficient) or a specific, later version? With, e.g., v3.2 we would still be able to keep Julia v1.9 compatibility.
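For reference, Julia `[compat]` entries use caret semantics: `"3"` allows any release in [3.0.0, 4.0.0), `"3.27.1"` allows only [3.27.1, 4.0.0), and `"2.38.10, 3"` is the union of [2.38.10, 3.0.0) and [3.0.0, 4.0.0). The sketch below checks a few versions against such entries. It assumes `Pkg.Types.semver_spec`, which is Pkg's internal compat parser rather than public API, so treat it as an illustration only.

```julia
using Pkg

# `Pkg.Types.semver_spec` parses a [compat] string into a VersionSpec.
# It is an internal Pkg function, used here purely for illustration.
spec_any_v3   = Pkg.Types.semver_spec("3")           # [3.0.0, 4.0.0)
spec_min_3271 = Pkg.Types.semver_spec("3.27.1")      # [3.27.1, 4.0.0)
spec_both     = Pkg.Types.semver_spec("2.38.10, 3")  # [2.38.10, 3.0.0) ∪ [3.0.0, 4.0.0)

# Print which of the three compat entries admit each version.
for v in (v"2.38.10", v"3.0.0", v"3.2.0", v"3.27.1", v"4.0.0")
    println(v, ": \"3\" => ", v in spec_any_v3,
            ", \"3.27.1\" => ", v in spec_min_3271,
            ", \"2.38.10, 3\" => ", v in spec_both)
end
```

This is why the answer matters: with `"3"`, the resolver could still pick an early v3 release that supports Julia v1.9, while a lower bound of `"3.27.1"` excludes those releases.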

@huiyuxie
Member Author

huiyuxie commented Dec 9, 2024

Good question 👍 Let me check.

@huiyuxie
Member Author

huiyuxie commented Dec 9, 2024

Which version of RecursiveArrayTools.jl is really required?

I have no idea. I just know that >= 2.38.10 does not work and >= 3 works. Can you answer Michael's question, @jlchan? I don't know which version you actually need.

@ranocha
Member

ranocha commented Dec 10, 2024

I would hesitate to require Julia 1.10 and newer. We have seen some performance impacts in HPC settings, e.g., JuliaLang/julia#55009, JuliaLang/julia#50985

@jlchan
Contributor

jlchan commented Dec 10, 2024

Which version of RecursiveArrayTools.jl is really required?

I have no idea. I just know that >= 2.38.10 does not work and >= 3 works. Can you answer Michael's question, @jlchan? I don't know which version you actually need.

We'd need at least RecursiveArrayTools.jl 3.27.1, which is where @huiyuxie's PR fixing VectorOfArray was implemented.

@sloede
Member

sloede commented Dec 10, 2024

Which version of RecursiveArrayTools.jl is really required?

I have no idea. I just know that >= 2.38.10 does not work and >= 3 works. Can you answer Michael's question, @jlchan? I don't know which version you actually need.

We'd need at least RecursiveArrayTools.jl 3.27.1, which is where @huiyuxie's PR fixing VectorOfArray was implemented.

Too bad. That version already requires Julia v1.10 😞

@huiyuxie
Member Author

I would hesitate to require Julia 1.10 and newer. We have seen some performance impacts in HPC settings, e.g., JuliaLang/julia#55009, JuliaLang/julia#50985

Does that mean you prefer to wait with the upgrade to Julia 1.10 until both of these issues are resolved?

@ranocha
Member

ranocha commented Dec 11, 2024

That's something we need to discuss.

@JoshuaLampert
Member

This is a bad situation. IMHO, we cannot wait until these two issues are resolved before we fix the incompatibility with newer versions of the whole SciML stack. Fixing this feels more and more urgent, and it also looks like the two Julia issues will not be fixed anytime soon.
Is there any chance to keep the Julia compat bound at v1.8 but also allow both old and new versions of RecursiveArrayTools.jl (something like `RecursiveArrayTools = "2.38.10, 3"`)? Then we would need to implement different behavior depending on the version of RecursiveArrayTools.jl: for RecursiveArrayTools.jl < v3.27.1 we keep the current implementation, and for RecursiveArrayTools.jl >= v3.27.1 we use the new one. In that case, Julia < v1.10 would use the old implementation and Julia >= v1.10 the new one. Whether this is practically viable is another question. Do you see any major problem that rules out this solution?
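A minimal sketch of the version-dependent approach, under stated assumptions: `use_new_vectorofarray`, `wrap_solution`, and the two placeholder implementations are hypothetical names standing in for the old and new code paths, and the RecursiveArrayTools.jl version is passed in explicitly so the sketch runs without the package installed. Only the threshold v3.27.1 comes from this thread.

```julia
# Threshold from the discussion: the release containing the VectorOfArray fix.
const NEW_VECTOROFARRAY_VERSION = v"3.27.1"

# Hypothetical predicate: should the new VectorOfArray-based code path be used?
use_new_vectorofarray(rat_version::VersionNumber) =
    rat_version >= NEW_VECTOROFARRAY_VERSION

# Hypothetical placeholders for the two implementations; they only tag the input.
wrap_solution_old(u) = (:old, u)
wrap_solution_new(u) = (:new, u)

# Dispatch to the old or new implementation based on the given version.
function wrap_solution(u, rat_version::VersionNumber)
    if use_new_vectorofarray(rat_version)
        return wrap_solution_new(u)
    else
        return wrap_solution_old(u)
    end
end

# In real package code, one might instead compute a flag once at load time,
# e.g. `const USE_NEW = pkgversion(RecursiveArrayTools) >= v"3.27.1"`.
# Note that `pkgversion` only exists since Julia 1.9, so Julia 1.8 would
# need a different mechanism.
println(wrap_solution([1.0, 2.0], v"2.38.10"))
println(wrap_solution([1.0, 2.0], v"3.27.1"))
```

Gating on the RecursiveArrayTools.jl version rather than on the Julia `VERSION` keeps the choice correct even if the resolver picks an older RecursiveArrayTools.jl on a newer Julia.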

ranocha
ranocha previously approved these changes Jan 9, 2025
@ranocha
Member

ranocha commented Jan 9, 2025

This was quite a dependency nightmare. Let's see whether CI passes this time...

@ranocha
Member

ranocha commented Jan 9, 2025

There are some real CI failures, e.g., related to the signature of the ODE solution used for plotting: https://github.com/trixi-framework/Trixi.jl/actions/runs/12692326120/job/35377430185?pr=2194#step:7:1635

@ranocha
Member

ranocha commented Jan 9, 2025

It looks like there is a serious type instability or something similar with MPI, judging by the number of allocations: https://github.com/trixi-framework/Trixi.jl/actions/runs/12692326120/job/35377432752?pr=2194#step:7:12200

@ranocha
Member

ranocha commented Jan 9, 2025

I don't have the time to debug this further today. Any help is welcome 🙂

@jlchan
Contributor

jlchan commented Jan 9, 2025

I don't have the time to debug this further today. Any help is welcome 🙂

I can work on the plotting but it probably won't be until tomorrow.

@huiyuxie
Member Author

Thanks for helping @ranocha 😊

Review thread on Project.toml (outdated, resolved)
Co-authored-by: Valentin Churavy
@huiyuxie
Member Author

Why did updating the dependency cause such a huge increase in memory allocations in the MPI-related tests?

@ranocha
Member

ranocha commented Jan 15, 2025

We need to remember that Static.jl is still at v0.8 in the tests: https://github.com/trixi-framework/Trixi.jl/actions/runs/12754930050/job/35549858370?pr=2194#step:7:315

@ranocha
Member

ranocha commented Jan 15, 2025

The Makie visualization throws an error

  Got exception outside of a @test
  type Camera3D has no field attributes
  Stacktrace:
    [1] getproperty(x::Makie.Camera3D, f::Symbol)
      @ Base ./Base.jl:37
    [2] iplot(pd::Trixi.PlotData2DTriangulated{StructArrays.StructArray{StaticArraysCore.SVector{4, Float64}, 2, NTuple{4, Matrix{Float64}}, Int64}, Matrix{Float64}, Matrix{Float64}, StructArrays.StructArray{StaticArraysCore.SVector{4, Float64}, 2, NTuple{4, Matrix{Float64}}, Int64}, StaticArraysCore.SVector{4, String}, Matrix{Int32}}; plot_mesh::Bool, show_axis::Bool, colormap::Symbol, variable_to_plot_in::Int64)
      @ TrixiMakieExt ~/work/Trixi.jl/Trixi.jl/ext/TrixiMakieExt.jl:260

See https://github.com/trixi-framework/Trixi.jl/actions/runs/12783728860/job/35635370257?pr=2194#step:7:3861

@ranocha
Member

ranocha commented Jan 15, 2025

Based on the results

MPI                                                                  |   76     8     84  23m41.3s
  TreeMesh MPI                                                       |   20           20   6m17.3s
  P4estMesh MPI 2D                                                   |   14           14   2m21.1s
  T8codeMesh MPI 2D                                                  |   14           14   1m41.7s
  P4estMesh MPI 3D                                                   |   17     5     22   8m34.6s
    Examples 3D                                                      |   17     5     22   8m34.6s
      elixir_advection_basic.jl                                      |    2            2   1m06.6s
      elixir_advection_amr.jl                                        |    1     1      2     56.7s
      elixir_advection_amr_unstructured_curved.jl                    |    1     1      2   1m12.4s
      elixir_advection_restart.jl                                    |    2            2      1.7s
      elixir_advection_cubed_sphere.jl                               |    2            2      6.4s
      elixir_euler_source_terms_nonconforming_unstructured_curved.jl |    1     1      2     49.3s
      elixir_euler_source_terms_nonperiodic.jl                       |    2            2     42.0s
      elixir_euler_ec.jl                                             |    2            2     59.6s
      elixir_euler_source_terms_nonperiodic_hohqmesh.jl              |    2            2     46.7s
      elixir_mhd_alfven_wave_nonconforming.jl                        |    1     1      2   1m45.7s
  T8codeMesh MPI 3D                                                  |   11     3     14   4m45.1s
    Examples 3D                                                      |   11     3     14   4m45.1s
      elixir_advection_basic.jl                                      |    2            2     45.1s
      elixir_advection_amr.jl                                        |    1     1      2     50.9s
      elixir_advection_amr_unstructured_curved.jl                    |    1     1      2     54.4s
      elixir_advection_restart.jl                                    |    2            2      1.6s
      elixir_euler_source_terms_nonconforming_unstructured_curved.jl |    1     1      2     42.9s
      elixir_euler_source_terms_nonperiodic.jl                       |    2            2     34.2s
      elixir_euler_ec.jl                                             |    2            2     52.3s

Since only the AMR and nonconforming examples (i.e., the ones with mortars) fail, I guess the issue is somewhere in the MPI mortar code for p4est/t8code meshes.

@benegee
Contributor

benegee commented Jan 15, 2025

The Makie visualization throws an error

It could be `Makie.cameracontrols(ax.scene).controls.up_key` now.

@ranocha
Member

ranocha commented Jan 15, 2025

Could you please check that and push it to this branch?

@benegee
Contributor

benegee commented Jan 16, 2025

So far, I have found that running this example

module TestAllocations
using Trixi
trixi_include(@__MODULE__,
              joinpath(pwd(), "../examples/t8code_3d_dgsem/elixir_advection_amr.jl"))
@unpack mesh, equations, solver, cache = semi
@show @allocated Trixi.start_mpi_send!(cache.mpi_cache, mesh, equations, solver, cache)
end

on 2 MPI ranks makes `@allocated` report O(1e6) bytes on one of the ranks, while the other reports O(1000). When running with only 1 rank, with Julia 1.11, or with another example such as elixir_advection_basic.jl, all values are in the O(1000) range.

@ranocha
Member

ranocha commented Jan 17, 2025

@vchuravy Does the result from @benegee trigger an idea of what's wrong here?

7 participants